Subtractive Initialization of Nonnegative Matrix Factorizations for Document Clustering
Authors
Abstract
Nonnegative matrix factorizations (NMF) have recently come to play an important role in several fields, including pattern recognition, automated image exploitation, and data clustering. They provide a distinctive tool for obtaining a reduced representation of multivariate data using only additive components, which makes it possible to learn parts-based representations of the data. All algorithms for computing the NMF are iterative and converge only locally, so the result depends on the starting point and particular emphasis must be placed on a proper initialization. Selecting appropriate initial matrices becomes even harder when the data carry a specific meaning, as is the case in document clustering. In this paper, we present a new initialization method, based on the fuzzy subtractive scheme, that generates the initial matrices for NMF algorithms. We also present a preliminary comparison between the proposed initialization and other commonly adopted initializations, applying NMF algorithms to document clustering.
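The abstract does not give the algorithm itself, so what follows is only a minimal sketch under stated assumptions. It is not the authors' method. The sketch assumes that the "fuzzy subtractive scheme" is Chiu-style subtractive clustering:

1. Each unit-normalized document x_i gets a potential P_i = Σ_j exp(−α‖x_i − x_j‖²), with α = 4/r_a².
2. The document with the highest potential becomes a center.
3. Every potential is then reduced by P_k·exp(−β‖x_i − x_k‖²), with β = 4/r_b² and r_b = 1.5·r_a.

The k selected documents seed the columns of W, Gaussian memberships seed H, and Lee–Seung multiplicative updates then refine A ≈ WH. All function names, the radius ra=1.0, the floor value, and the membership-based H are hypothetical illustration choices. Plain Python is used so the sketch runs without dependencies.

```python
import math


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    # Naive matrix product; adequate for the toy sizes used here.
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def unit(v):
    # Scale a document vector to unit Euclidean length (zero vectors stay zero).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def subtractive_centers(docs, k, ra=1.0):
    """Chiu-style subtractive clustering: indices of k documents chosen as centers."""
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (1.5 * ra) ** 2  # squash radius rb = 1.5 * ra (assumed)
    pot = [sum(math.exp(-alpha * sqdist(p, q)) for q in docs) for p in docs]
    chosen = []
    for _ in range(k):
        # Highest remaining potential among documents not picked yet.
        best = max((i for i in range(len(docs)) if i not in chosen), key=lambda i: pot[i])
        chosen.append(best)
        pk, c = pot[best], docs[best]
        # Lower the potential of documents close to the new center.
        pot = [p - pk * math.exp(-beta * sqdist(d, c)) for p, d in zip(pot, docs)]
    return chosen


def subtractive_init(A, k, ra=1.0, floor=1e-6):
    """Hypothetical helper: initial W (terms x k) and H (k x docs) from subtractive centers."""
    raw = transpose(A)  # documents are the columns of A
    docs = [unit(d) for d in raw]
    centers = subtractive_centers(docs, k, ra)
    alpha = 4.0 / ra ** 2
    # Basis vector c starts as center document c; the floor avoids exact zeros,
    # which multiplicative updates can never move away from.
    W = [[raw[c][i] + floor for c in centers] for i in range(len(A))]
    # Soft (Gaussian-kernel) membership of every document in every center.
    H = [[math.exp(-alpha * sqdist(d, docs[c])) + floor for d in docs] for c in centers]
    return W, H, centers


def nmf_multiplicative(A, W, H, iters=200, eps=1e-9):
    """Lee-Seung multiplicative updates for min ||A - WH||_F with W, H >= 0."""
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def frob_error(A, W, H):
    WH = matmul(W, H)
    return math.sqrt(sum((a - b) ** 2 for row_a, row_b in zip(A, WH)
                         for a, b in zip(row_a, row_b)))


def cluster_labels(W, H):
    # Weight each coefficient by its basis-vector norm, then take the largest.
    norms = [math.sqrt(sum(x * x for x in col)) for col in transpose(W)]
    return [max(range(len(H)), key=lambda c: norms[c] * H[c][j])
            for j in range(len(H[0]))]


if __name__ == "__main__":
    # Toy term-document matrix: rows are terms, columns are documents.
    A = [[3, 2, 4, 0, 0, 0],
         [2, 3, 3, 0, 0, 1],
         [1, 2, 2, 0, 0, 0],
         [0, 0, 0, 3, 4, 2],
         [0, 1, 0, 2, 3, 3]]
    W0, H0, centers = subtractive_init(A, k=2)
    W, H = nmf_multiplicative(A, W0, H0)
    print("centers:", centers)
    print("error:", frob_error(A, W0, H0), "->", frob_error(A, W, H))
    print("labels:", cluster_labels(W, H))
```

On this toy matrix the two selected centers should come from different topic groups, so the factorization starts near the block structure that a random initialization would have to discover. Weighting H by the column norms of W before taking the argmax is a design choice: it keeps the labels insensitive to how the updates split scale between W and H.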
Similar resources
A Projected Alternating Least square Approach for Computation of Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) is a common data-mining method that has been used in various applications for dimension reduction, classification, or clustering. Methods based on the alternating least squares (ALS) approach are usually used to solve this non-convex minimization problem. At each step of an ALS algorithm, two convex least squares problems must be solved, which causes high com...
Document clustering using nonnegative matrix factorization
A methodology for automatically identifying and clustering semantic features or topics in a heterogeneous text collection is presented. Textual data is encoded using a low rank nonnegative matrix factorization algorithm to retain natural data nonnegativity, thereby eliminating the need to use subtractive basis vector and encoding calculations present in other techniques such as principal compone...
Efficient Document Clustering via Online Nonnegative Matrix Factorizations
In recent years, Nonnegative Matrix Factorization (NMF) has received considerable interest from the data mining and information retrieval fields. NMF has been successfully applied in document clustering, image representation, and other domains. This study proposes an online NMF (ONMF) algorithm to efficiently handle very large-scale and/or streaming datasets. Unlike conventional NMF solutions w...
Stable Biclustering of Gene Expression Data with Nonnegative Matrix Factorizations
clustering is probably the most frequently used tool for data mining gene expression data, existing clustering approaches face at least one of the following problems in this domain: a huge number of variables (genes) as compared to the number of samples, high noise levels, the inability to naturally deal with overlapping clusters, the instability of the resulting clusters w.r.t. the initializat...